9.1.0 Bayesian Inference

The following is a general setup for a statistical inference problem: there is an unknown 
quantity that we would like to estimate. We get some data. From the data, we estimate the 
desired quantity. In the previous chapter, we discussed the frequentist approach to this 
problem. In that approach, the unknown quantity $\theta$ is assumed to be a fixed (non-random) 
quantity that is to be estimated from the observed data. 

In this chapter, we would like to discuss a different framework for inference, namely the 
Bayesian approach. In the Bayesian framework, we treat the unknown quantity, $\Theta$, as a 
random variable. More specifically, we assume that we have some initial guess about the 
distribution of $\Theta$. This distribution is called the prior distribution. After observing some data, 
we update the distribution of $\Theta$ (based on the observed data). This step is usually done using 
Bayes' rule. That is why this approach is called the Bayesian approach. The details of this 
approach will become clearer as you go through the chapter. Here, to motivate the Bayesian 
approach, we provide two examples of statistical problems that might be solved using the 
Bayesian approach. 


Example 9.1 

Suppose that you would like to estimate the proportion of voters in your town that plan to vote for 
Party A in an upcoming election. To do so, you take a random sample of size $n$ from the likely 
voters in the town. Since you have a limited amount of time and resources, your sample is 
relatively small. Specifically, suppose that $n = 20$. After doing your sampling, you find out 
that 6 people in your sample say they will vote for Party A. 

• Solution 

o Let $\theta$ be the true proportion of voters in your town who plan to vote for Party A. You 
might want to estimate $\theta$ as 

$$\hat{\theta} = \frac{6}{20} = 0.3.$$


In fact, in the absence of any other data, that seems to be a reasonable estimate. 
However, you might feel that $n = 20$ is too small, so the error in your estimate 
might be large. While thinking about this problem, 
you remember that the data from the previous election is available to you. You 
look at that data and find out that, in the previous election, 40% of the people in 
your town voted for Party A. How can you use this data to possibly improve your 
estimate of $\theta$? You might argue as follows: 
Although the proportion of votes for Party A changes from one election to another, 
the change is not usually very drastic. Therefore, given that in the previous 
election 40% of the voters voted for Party A, you might want to model the 
proportion of votes for Party A in the next election as a random variable $\Theta$ with a 
probability density function, $f_\Theta(\theta)$, that is mostly concentrated around 
$\theta = 0.4$. For example, you might want to choose the density such that 

$$E[\Theta] = 0.4.$$

Figure 9.1 shows an example of such a density function. Such a distribution shows 
your prior belief about $\Theta$ in the absence of any additional data. That is, before 
taking your random sample of size $n = 20$, this is your guess about the 
distribution of $\Theta$. 



Figure 9.1 - An example of a prior distribution for $\Theta$ in Example 9.1 

Therefore, you initially have the prior distribution $f_\Theta(\theta)$. Then you collect some 
data, denoted by $D$. More specifically, here your data is a random sample of 
$n = 20$ voters, 6 of whom are voting for Party A. As we will discuss in more 
detail, you can then proceed to find an updated distribution for $\Theta$, called the 
posterior distribution, using Bayes' rule: 


$$f_{\Theta|D}(\theta|D) = \frac{P(D|\theta)\, f_\Theta(\theta)}{P(D)}.$$


We can now use the posterior density, $f_{\Theta|D}(\theta|D)$, to further draw inferences 
about $\Theta$. More specifically, we might use it to find point or interval estimates of 
$\Theta$. 
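To make the Bayes' rule step concrete, here is a minimal numerical sketch for Example 9.1. The text does not fix a family for $f_\Theta(\theta)$; purely as an illustrative assumption, it uses a Beta(4, 6) prior, which has mean $4/10 = 0.4$ and standard deviation of about 0.15, and it models $P(D|\theta)$ as the binomial probability of 6 "yes" answers out of 20. The function names are hypothetical. The posterior is approximated on a midpoint grid over $(0, 1)$, with $P(D)$ computed as a Riemann sum.

```python
import math

def beta_pdf(theta, a, b):
    """Beta(a, b) density, used here as an ASSUMED prior f_Theta(theta)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta))

def likelihood(theta, n=20, k=6):
    """P(D | theta): probability that k of the n sampled voters favor Party A."""
    return math.comb(n, k) * theta**k * (1 - theta) ** (n - k)

def grid_posterior(prior, num_points=10_000):
    """Approximate f_{Theta|D}(theta|D) on midpoints of a uniform grid over (0, 1)."""
    d = 1.0 / num_points
    grid = [(i + 0.5) * d for i in range(num_points)]   # midpoints avoid theta = 0 or 1
    unnorm = [likelihood(t) * prior(t) for t in grid]   # P(D|theta) f_Theta(theta)
    evidence = sum(unnorm) * d                          # P(D) by a Riemann sum
    return grid, [u / evidence for u in unnorm], d

grid, post, d = grid_posterior(lambda t: beta_pdf(t, 4, 6))
post_mean = sum(t * p for t, p in zip(grid, post)) * d
print(f"posterior mean ~ {post_mean:.4f}")
```

Under this particular prior the posterior is exactly Beta(10, 20), since the Beta family is conjugate to the binomial likelihood, so the grid mean should land near $10/30 \approx 0.333$: between the sample estimate 0.3 and the prior mean 0.4. The grid approach has the advantage of working unchanged for a non-conjugate prior.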


Example 9.2 

Consider a communication channel as shown in Figure 9.2. We can model the communication 
over this channel as follows. At time $n$, a random variable $X_n$ is generated and transmitted 
over the channel. However, the channel is noisy. Thus, at the receiver, a noisy version of $X_n$ 
is received. More specifically, the received signal is 

$$Y_n = X_n + W_n,$$
where $W_n \sim N(0, \sigma^2)$ is the noise added to $X_n$. We assume that the receiver knows the 
distribution of $X_n$. The goal here is to recover (estimate) the value of $X_n$ based on the 
observed value of $Y_n$. 


Figure 9.2 - Noisy communication channel in Example 9.2 (transmitted signal $X_n$, noise $W_n \sim N(0, \sigma^2)$, received signal $Y_n = X_n + W_n$)


• Solution 

o Again, we are dealing with estimating a random variable ($X_n$). In this case, the 
prior distribution is $f_{X_n}(x)$. After observing $Y_n$, the posterior distribution can 
be written as 


$$f_{X_n|Y_n}(x|y) = \frac{f_{Y_n|X_n}(y|x)\, f_{X_n}(x)}{f_{Y_n}(y)}.$$


Here, we have assumed both $X_n$ and $Y_n$ are continuous random variables. The 
above formula is a version of Bayes' rule. We will discuss the details of this 
approach shortly; however, as you'll notice, we are using the same framework as in 
Example 9.1. After finding the posterior distribution, $f_{X_n|Y_n}(x|y)$, we can then 
use it to estimate the value of $X_n$. 
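The text leaves the prior $f_{X_n}(x)$ unspecified. As an illustrative assumption only, the sketch below takes $X_n \sim N(0, \sigma_X^2)$, independent of $W_n$, so that $f_{Y_n|X_n}(y|x)$ is the $N(x, \sigma^2)$ density evaluated at $y$. It evaluates the Bayes' rule formula on a grid and compares the posterior mean with the standard Gaussian closed form $\frac{\sigma_X^2}{\sigma_X^2 + \sigma^2}\, y$. The variances, the grid limits, and the observed value are made up for the example.

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def posterior_on_grid(y, prior_var, noise_var, lo=-8.0, hi=8.0, num_points=4001):
    """Grid version of f_{X|Y}(x|y) = f_{Y|X}(y|x) f_X(x) / f_Y(y).

    ASSUMES X ~ N(0, prior_var) independent of W ~ N(0, noise_var), with Y = X + W.
    """
    dx = (hi - lo) / (num_points - 1)
    xs = [lo + i * dx for i in range(num_points)]
    # Given X = x, Y = x + W, so f_{Y|X}(y|x) is the N(x, noise_var) density at y.
    unnorm = [normal_pdf(y, x, noise_var) * normal_pdf(x, 0.0, prior_var) for x in xs]
    f_y = sum(unnorm) * dx  # f_Y(y) by a Riemann sum
    return xs, [u / f_y for u in unnorm], dx

PRIOR_VAR, NOISE_VAR = 1.0, 0.5  # assumed sigma_X^2 and sigma^2
y_obs = 1.2                      # a made-up received value
xs, post, dx = posterior_on_grid(y_obs, PRIOR_VAR, NOISE_VAR)
grid_mean = sum(x * p for x, p in zip(xs, post)) * dx
closed_form_mean = PRIOR_VAR / (PRIOR_VAR + NOISE_VAR) * y_obs
print(grid_mean, closed_form_mean)
```

The closed form shows the usual Bayesian compromise: the estimate shrinks the received value $y$ toward the prior mean 0, and it shrinks more when the noise variance is large relative to $\sigma_X^2$.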


If you think about Examples 9.1 and 9.2 carefully, you will notice that they have similar 
structures. Basically, in both problems, our goal is to draw an inference about the value of an 
unobserved random variable ($\Theta$ or $X_n$). We observe some data ($D$ or $Y_n$). We then use 
Bayes' rule to make inferences about the unobserved random variable. This is generally how we 
approach inference problems in Bayesian statistics. 

It is worth noting that Examples 9.1 and 9.2 are conceptually different in the following sense: 
In Example 9.1, the choice of prior distribution $f_\Theta(\theta)$ is somewhat unclear. That is, different 
people might use different prior distributions. In other words, the choice of prior distribution is 
subjective here. On the other hand, in Example 9.2, the prior distribution $f_{X_n}(x)$ might be 
determined as a part of the communication system design. In other words, for this example, 
the prior distribution might be known without any ambiguity. Nevertheless, once the prior 
distribution is determined, one uses similar methods to attack both problems. For this 
reason, we study both problems under the umbrella of Bayesian statistics. 
Bayesian Statistical Inference 

The goal is to draw inferences about an unknown variable $X$ by observing a related random 
variable $Y$. The unknown variable is modeled as a random variable $X$, with prior distribution 

$$f_X(x), \text{ if } X \text{ is continuous},$$

$$P_X(x), \text{ if } X \text{ is discrete}.$$

After observing the value of the random variable $Y$, we find the posterior distribution of $X$. 
This is the conditional PDF (or PMF) of $X$ given $Y = y$, 

$$f_{X|Y}(x|y) \quad \text{or} \quad P_{X|Y}(x|y).$$

The posterior distribution is usually found using Bayes' formula. Using the posterior 
distribution, we can then find point or interval estimates of X. 

Note that in the above setting, $X$ or $Y$ (or possibly both) could be random vectors. For 
example, $X = (X_1, X_2, \cdots, X_n)$ might consist of several random variables. However, 
the general idea of Bayesian statistics stays the same. We will specifically talk about 
estimating random vectors in Section 9.1.7. 
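For a discrete $X$, the recipe above reduces to a few lines: multiply the prior PMF by the likelihood of the observed $y$, then divide by $P_Y(y)$ from the law of total probability. The sketch below is a generic version under that assumption; the function names, the binary "flip" channel, and its numbers are hypothetical. Taking the posterior mode is shown only as one possible point estimate.

```python
from typing import Callable, Dict, Hashable

def posterior_pmf(prior: Dict[Hashable, float],
                  likelihood: Callable[[Hashable, Hashable], float],
                  y: Hashable) -> Dict[Hashable, float]:
    """Bayes' rule for discrete X: P_{X|Y}(x|y) = P_{Y|X}(y|x) P_X(x) / P_Y(y)."""
    joint = {x: likelihood(y, x) * p for x, p in prior.items()}
    p_y = sum(joint.values())  # P_Y(y) by the law of total probability
    if p_y == 0:
        raise ValueError("observed y has zero probability under the model")
    return {x: j / p_y for x, j in joint.items()}

# Hypothetical example: a bit X in {0, 1} with P(X = 1) = 0.3 is sent through a
# channel that flips it with probability 0.1, and we observe Y = 1.
def flip_channel(y, x, flip=0.1):
    """P_{Y|X}(y|x) for the hypothetical binary flip channel."""
    return 1 - flip if y == x else flip

post = posterior_pmf({0: 0.7, 1: 0.3}, flip_channel, y=1)
x_map = max(post, key=post.get)  # posterior mode, one possible point estimate
print(post, x_map)
```

Here $P_{X|Y}(1|1) = \frac{0.9 \cdot 0.3}{0.9 \cdot 0.3 + 0.1 \cdot 0.7} = 0.27/0.34 \approx 0.79$: the observation raises the probability of $X = 1$ from the prior 0.3. If $Y$ is continuous, the same function works with a conditional density $f_{Y|X}(y|x)$ in place of `likelihood`, since the normalization cancels any common factor.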